Remove separate syntax heads for each operator #575

base: kf/dots

Conversation
Unfortunately, the sequences `..` and `...` do not always refer to the `..` operator or the `...` syntax. There are two and a half cases where they don't:

1. After `@` in a macrocall, where they are both regular identifiers
2. In `import ...A`, where the dots specify the level
3. `:(...)`, which treats `...` as a quoted identifier

Case 1 was handled in a previous commit by lexing these as identifiers after `@`. However, as a result of case 2, it is problematic to tokenize these dots together; we essentially have to untokenize them in the import parser. It is also infeasible to give the lexer special context-sensitive lexing in `import`, because there could be arbitrary interpolations, e.g. `@eval import A, $(f(x..y)), ..b`, so deciding whether a particular `..` after `import` refers to the operator or a level specifier requires the parser. Currently the parser handles this by splitting the obtained tokens again in the import parser, but this is undesirable, because it breaks the invariant that the tokens produced by the lexer correspond to the terminals of the final parse tree.

This PR addresses this by only ever having the lexer emit `K"."` and having the parser decide which case it refers to. The new non-terminal `K"dots"` handles the identifier cases (ordinary `..` and quoted `:(...)`). `K"..."` is now exclusively used for splat/slurp, and is no longer used in its non-terminal form for case 3.
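The split of responsibilities described above can be sketched abstractly (a simplified Python illustration with hypothetical names, not JuliaSyntax code): the lexer only ever emits single-dot tokens, and the parser fuses a run of dots into the `..` operator, the `...` form, or an import level specifier depending on context.

```python
# Simplified sketch (hypothetical, not JuliaSyntax code): the lexer emits one
# token per '.', and the parser decides contextually what a run of dots means.

def lex(src):
    """Emit ('.', i) for each dot; all other chars become ('char', c) tokens."""
    return [('.', i) if c == '.' else ('char', c) for i, c in enumerate(src)]

def fuse_dots(tokens, in_import):
    """Parser-side fusion: in import position, dots are level specifiers;
    elsewhere, two dots form the `..` operator and three form `...`."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i][0] == '.':
            j = i
            while j < len(tokens) and tokens[j][0] == '.':
                j += 1
            n = j - i
            if in_import:
                out.append(('import-level', n))    # e.g. `import ...A`
            elif n == 2:
                out.append(('op', '..'))           # the `..` operator
            elif n == 3:
                out.append(('dots', '...'))        # splat/slurp or quoted `...`
            else:
                out.append(('dot', '.') if n == 1 else ('error', '.' * n))
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out
```

The key point mirrored here is that the same dot tokens get different interpretations purely from parser context, which is why the real lexer cannot decide on its own.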
Codecov Report

```
@@            Coverage Diff            @@
##            kf/dots     #575  +/-   ##
==========================================
  Coverage          ?   95.41%
==========================================
  Files             ?       16
  Lines             ?     4578
  Branches          ?        0
==========================================
  Hits              ?     4368
  Misses            ?      210
  Partials          ?        0
```
This replaces all the specialized operator heads by a single `K"Operator"` head that encodes the precedence level in its flags (except for operators that are also used for non-operator purposes). The operators are already `K"Identifier"` in the final parse tree. There is very little reason to spend all of the extra effort separating them into separate heads only to undo this later. Moreover, I think it's actively misleading, because it makes people think that they can query things about an operator by looking at the head, which doesn't work for suffixed operators.

Additionally, this removes the `op=` token, replacing it by two tokens: one `K"Operator"` with a special precedence level and one `=`. This then removes the last use of `bump_split` (since this PR is on top of #573).

As a free bonus, this prepares us for having compound assignment syntax for suffixed operators, which was infeasible in the flisp parser. That syntax change is not part of this PR but would be trivial (this PR makes it an explicit error).

Fixes #334
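The `op=` splitting described above can be illustrated with a toy tokenizer (a minimal sketch with made-up token tuples, not the JuliaSyntax implementation): on seeing an operator followed by `=`, it emits the operator tagged with a special assignment precedence plus a plain `=` token, instead of one fused `op=` token.

```python
# Toy illustration (not JuliaSyntax code): `a += b` tokenizes to an Operator
# token carrying an assignment-precedence flag followed by a plain `=`,
# rather than a single fused `op=` token.

OPERATORS = {'+', '-', '*', '|>'}  # tiny subset for the sketch

def tokenize(src):
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
            continue
        two = src[i:i+2]
        if two in OPERATORS or c in OPERATORS:
            op = two if two in OPERATORS else c
            i += len(op)
            if i < len(src) and src[i] == '=':
                # Compound assignment: Operator flagged with assignment
                # precedence, then a separate plain `=` token.
                tokens.append(('Operator', op, 'assign-prec'))
                tokens.append(('=', '=', None))
                i += 1
            else:
                tokens.append(('Operator', op, 'infix-prec'))
        elif c == '=':
            tokens.append(('=', '=', None))
            i += 1
        else:
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in '+-*|=':
                j += 1
            tokens.append(('Identifier', src[i:j], None))
            i = j
    return tokens
```

Because the operator itself is an ordinary token here, a suffixed operator followed by `=` needs no special fused token, which is what makes compound assignment for suffixed operators cheap to support (or to reject with a targeted error).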
Ok, so I like this a lot in overview and I think the idea is right.
But there's a fair bit to clean up in the implementation and I'm going to be honest, this took an absolute ton of time to review.
One pervasive issue I find confusing in the parser.jl code changes is that the `isassign` output of `peek_dotted_op_token()` is often ignored, but not always. Which cases is this actually ok for? One practical difference between ignoring `isassign` vs not is the difference between the following errors.

With `isassign` checked for:

```julia
julia> parsestmt(SyntaxNode, "x |>= y")
ERROR: ParseError:
# Error @ line 1:3
x |>= y
#  └┘ ── Compound assignment is not allowed for this operator
```

vs `isassign` not checked for:

```julia
julia> parsestmt(SyntaxNode, "x ..= y")
ERROR: ParseError:
# Error @ line 1:5
x ..= y
#    ╙ ── unexpected `=`
```

Was Claude or another AI tool used for the code changes?

I feel we may need a bunch more tests to check that `isassign` is used correctly.
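The review concern above can be illustrated abstractly (hypothetical helper names, not JuliaSyntax code): a peek helper that returns both a token and an `isassign` flag invites bugs at call sites that discard the flag, because the stray `=` then surfaces later as a less helpful error.

```python
# Abstract sketch of the review concern: peek_op returns (op, isassign); a
# call site that ignores isassign loses the chance to emit a targeted error.

ASSIGN_OK = {'+', '-', '*'}  # operators where compound assignment is allowed

def peek_op(tokens):
    """Return (operator, isassign) for a leading `op` or `op=` sequence."""
    op = tokens[0]
    isassign = len(tokens) > 1 and tokens[1] == '='
    return op, isassign

def parse_checked(tokens):
    # Checks the flag: can reject `op=` forms with a precise error.
    op, isassign = peek_op(tokens)
    if isassign and op not in ASSIGN_OK:
        return f"error: compound assignment is not allowed for `{op}`"
    return f"ok: {op}{'=' if isassign else ''}"

def parse_unchecked(tokens):
    # Ignores the flag: the `=` is left for later code to trip over,
    # producing a generic "unexpected `=`"-style failure elsewhere.
    op, _ = peek_op(tokens)
    return f"ok: {op}"
```

This is only a caricature of the two error paths quoted in the review; the real question is which call sites of `peek_dotted_op_token()` may legitimately discard the flag.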
```julia
@test tok("1+=2", 2).kind == K"Operator" # + before =
@test tok("1+=2", 3).kind == K"="
```

For testing multiple tokens with the same input, I suggest `toks()`:

```diff
-@test tok("1+=2", 2).kind == K"Operator" # + before =
-@test tok("1+=2", 3).kind == K"="
+@test toks("1+=2")[2:3] == ["+"=>K"Operator", "="=>K"="]
```
(a lot of tests for tokenize.jl were written over time with various test tooling and haven't necessarily been updated to the latest way to do these things)
```diff
@@ -1217,5 +653,5 @@ function is_syntactic_operator(x)
     # in the parser? The lexer itself usually disallows such tokens, so it's
     # not clear whether we need to handle them. (Though note `.->` is a
     # token...)
-    return k in KSet"&& || . ... ->" || is_syntactic_assignment(k)
+    return k in KSet"&& || . ... -> = :="
```

With this change we now have

```julia
julia> JuliaSyntax.is_syntactic_operator(K".=")
false
```

whereas it used to be `true`. Was this intentional?
```diff
-function is_plain_equals(t)
-    kind(t) == K"=" && !is_suffixed(t)
-end
+is_plain_equals(t) = kind(t) == K"="
```

Let's remove this function; it's only used in two places and the test is now trivial.
In `src/julia/tokenize.jl` (outdated):

```julia
"""
    emit(l::Lexer, kind::Kind)

Returns a `RawToken` of kind `kind` with contents `str` and starts a new `RawToken`.
"""
```

Wrong docstring.
```diff
@@ -608,14 +600,18 @@ function parse_assignment_with_initial_ex(ps::ParseState, mark, down::T) where {
     # a += b ==> (+= a b)
```

The comment needs fixing, I guess (or delete it, because it's covered in the if-else below).
```julia
emit(ps, mark, leading_dot ? K".op=" : K"op=")
if check_identifiers
    # += ==> (error (op= +))
    # .+= ==> (error (. (op= +)))
```

This is not correct; suggestion:

```diff
-    # .+= ==> (error (. (op= +)))
+    # .+= ==> (error (.op= +))
```
```julia
@@ -76,6 +76,8 @@ tests = [
    "f(x) where S where U = 1" => "(function-= (where (where (call f x) S) U) 1)"
    "(f(x)::T) where S = 1" => "(function-= (where (parens (::-i (call f x) T)) S) 1)"
    "f(x) = 1 = 2" => "(function-= (call f x) (= 1 2))" # Should be a warning!
    # Bad assignment with suffixed op
    ((v = v"1.12",), "a +₁= b") => "(op= a (error +₁) b)"
```

This is not a version-specific error as implemented; suggestion:

```diff
-    ((v = v"1.12",), "a +₁= b") => "(op= a (error +₁) b)"
+    "a +₁= b" => "(op= a (error +₁) b)"
```
```julia
tokens = tokenize("+₁")
@test length(tokens) == 1 # Just the identifier, endmarker is not included in tokenize()
@test kind(tokens[1]) == K"Identifier"
```

Suggestion:

```diff
-tokens = tokenize("+₁")
-@test length(tokens) == 1 # Just the identifier, endmarker is not included in tokenize()
-@test kind(tokens[1]) == K"Identifier"
+@test tokensplit("+₁") == [K"Identifier"=>"+₁"]
```
```diff
 @testset "dotted and suffixed operators" begin
-    for opkind in Tokenize._nondot_symbolic_operator_kinds()
+    for opkind in _nondot_symbolic_operator_kinds()
```

This seems incorrect: this test now omits many, many operators? It used to depend on the fact that all the operator kinds were listed individually. Instead, I guess we should have a big list of all the allowable operators here, separate from the list in Tokenize.
```julia
is_syntactic_operator(leading_kind) ? leading_kind : K"Identifier")

# Check if this is a compound assignment operator pattern
```

Redundant comment; suggest removing it.
Thanks for the extensive review - I'll wait until the other PR is merged to rebase and fix those up.
I used Claude off and on for the big changeset that this was extracted out of (along with several of the earlier PRs). However, I think the things you flagged as most objectionable are not Claude's fault, but rather an artifact of multiple iterations and rebases as the earlier patches in this sequence were cleaned up and put in individually.